-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Definition and shape inference for ONNXParallelOp and ONNXForkOp #2810
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@imaihal very clean interface. High level question before I go into the details of the PR; how are the output of the ONNX.Parallel
defined? Is the first output the output of the first fork, and the second output the output of the second fork?
Because in mlir, parallel
already exists in thescf
dialect as the "equivalent" of omp parallel for
, maybe we should name this construct ONNX.ParallelForks
? Just to avoid future conflicts if we were to introduce a parallel loop.
In term of OMP, the provided functionality here is the closest to the omp parallel sections
with omp section
inside. So that could be an alternative naming scheme (namely ONNX.ParallelSections
and ONNX.Section
).
@AlexandreEichenberger Thanks for your comments!
I didn't consider it. The output order of Regarding naming, current |
The way the values of each "yield" gets reflected into the output of the "parallel" needs to be defined in the text associated with the operations. If the current approach is that the "first" fork/yield produces the "first" output value of the "parallel", and the "second" fork/yield generates the "second" output value of the parallel, that is perfectly fine. We just need to make sure the fork are not reordered (I am sure they are not), and clearly explain this ordering in the definition/example associated with the new operations. |
Thanks for proposing these operations. I have some high-level comments.
There is a subtle difference between It looks to me adding |
The other structural ONNX ops, such as ONNX.IF and ONNX.Loop, can read value from outside, but all the generated values can be used outside only through yield. Do the new ops have the same behavior? |
To me, the parallel op is a scheduling decision, and should be added in a later phase. They should be not be in onnx dialect, if there is another choice. By the way, can we use omp.sections for this purpose? |
I updated the description for the operations. Thanks. |
Yes. The ONNXParallel and ONNXFork have the same behavior.
We assume these operations are inserted just before ONNXToKrnl pass after applying optimizations in ONNX IR.(Currently we are using a rewriting tool for testing. We generate ONNXIR using -EmitONNXIR, then insert ONNXParallelOp and ONNXForkOp using the tool). I think omp.section is memref-level operation. We wanted tensor-level operations to parallelize in tensor level. Currently in PR #2756, we used krnl.parallel in memre-level reusing existing openMP parallelization, but we should be able to lower these operations into omp.section. |
This PR introduces new operations for Operator-level parallelization. Overall plan for implementation of Operator-level parallelization is described in issue #2811 . This PR is the first PR.
ONNXParallelOp
ONNXForkOps
should be included in the body region. The threads created by ONNXForkOp are synchronized with main thread at the end of this operation. The results to be used in following operations need to be set as the results of this operation by using ONNXYieldOp.ONNXForkOp
body
region of the operation.